<!DOCTYPE HTML>
<html>

<head>
    <!-- Google tag (gtag.js) -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-1TRKDN34JX"></script>
    <script>
        window.dataLayer = window.dataLayer || [];
        function gtag() { dataLayer.push(arguments); }
        gtag('js', new Date());

        gtag('config', 'G-1TRKDN34JX');
    </script>

    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link href="https://fonts.googleapis.com/css2?family=Source+Sans+3&display=swap" rel="stylesheet">

    <title>DEVA</title>

    <meta name="viewport" content="width=device-width, initial-scale=1">
    <!-- CSS only -->
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.1/dist/css/bootstrap.min.css" rel="stylesheet"
        integrity="sha384-+0n0xVW2eSR5OomGNYDnhzAbDsOXxcvSN1TPprVMTNDbiYZCxYbOOl7+AMvyTG2x" crossorigin="anonymous">
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>

    <link href="style.css" type="text/css" rel="stylesheet" media="screen,projection" />
</head>

<body>
    <br><br><br><br>
    <div class="container">
        <div class="row text-center" style="font-size:38px">
            <div class="col">
                Tracking Anything with Decoupled Video Segmentation
            </div>
        </div>

        <br>
        <div class="row text-center" style="font-size:28px">
            <div class="col">
                ICCV 2023
            </div>
        </div>
        <br>

        <div class="h-100 row text-center heavy justify-content-md-center" style="font-size:22px;">
            <div class="col-sm-auto px-lg-4">
                <a href="https://hkchengrex.github.io/">Ho Kei Cheng</a>
            </div>
            <div class="col-sm-auto px-lg-4">
                <nobr><a href="https://sites.google.com/view/seoungwugoh/">Seoung Wug Oh</a></nobr>
            </div>
            <div class="col-sm-auto px-lg-4">
                <nobr><a href="https://www.brianpricephd.com/">Brian Price</a></nobr>
            </div>
            <div class="col-sm-auto px-lg-4">
                <nobr><a href="https://www.alexander-schwing.de/">Alexander Schwing</a></nobr>
            </div>
            <div class="col-sm-auto px-lg-4">
                <nobr><a href="https://joonyoung-cv.github.io/">Joon-Young Lee</a></nobr>
            </div>
        </div>

        <br>

        <div class="h-100 row text-center justify-content-md-center" style="font-size:20px;">
            <div class="col-sm-2">
                <a href="https://arxiv.org/abs/2309.03903">[arXiv]</a>
            </div>
            <div class="col-sm-2">
                <a href="https://arxiv.org/pdf/2309.03903.pdf">[Paper]</a>
            </div>
            <div class="col-sm-2">
                <a href="https://github.com/hkchengrex/Tracking-Anything-with-DEVA">[Code &amp; Demo]</a>
            </div>
            <div class="col-sm-2">
                <a href="https://colab.research.google.com/drive/1OsyNVoV_7ETD1zIE8UWxL3NXxu12m_YZ?usp=sharing">[Colab]</a>
            </div>
        </div>

        <br>

        <hr>

        <div class="row" style="font-size:32px">
            <div class="col">
                Abstract
            </div>
        </div>
        <br>
        <div class="row">
            <div class="col">
                <p style="text-align: justify;">
                    Training data for video segmentation are expensive to annotate.
                    This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in
                    large-vocabulary settings.
                    To 'track anything' without training on video data for every individual task, we develop a decoupled
                    video segmentation approach (DEVA), composed of task-specific image-level segmentation and
                    class/task-agnostic bi-directional temporal propagation.
                    Due to this design, we only need an image-level model for the target task (which is cheaper to
                    train) and a universal temporal propagation model which is trained once and generalizes across
                    tasks.
                    To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion
                    of segmentation hypotheses from different frames to generate a coherent segmentation.
                    We show that this decoupled formulation compares favorably to end-to-end approaches in several
                    data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video
                    segmentation, referring video segmentation, and unsupervised video object segmentation.
                </p>
            </div>
        </div>

        <br>
        <hr>
        <br>
        
        <div class="row" style="font-size:32px">
            <div class="col">
                Demo with Grounded Segment Anything (text prompts: "guinea pigs" and "chicken"):
            </div>
        </div>
        <br>
        <center>
            <video style="width: 100%" controls>
                <source
                    src="https://user-images.githubusercontent.com/7107196/267217000-457a9a6a-86c3-4c5a-a3cc-25199427cd11.mp4"
                    type="video/mp4">
                Your browser does not support the video tag.
            </video>
            Source: <a href="https://youtu.be/FM9SemMfknA">https://youtu.be/FM9SemMfknA</a>
        </center>

        <br>
        <hr>
        <br>
        
        <div class="row" style="font-size:32px">
            <div class="col">
                Demo with Grounded Segment Anything (text prompt: "pigs"):
            </div>
        </div>
        <br>
        <center>
            <video style="width: 100%" controls>
                <source
                    src="https://user-images.githubusercontent.com/7107196/265595989-9a6dbcd1-2c84-45c8-ac0a-4ad31169881f.mp4"
                    type="video/mp4">
                Your browser does not support the video tag.
            </video>
            Source: <a href="https://youtu.be/FbK3SL97zf8">https://youtu.be/FbK3SL97zf8</a>
        </center>

        <br>
        <hr>
        <br>

        <div class="row" style="font-size:32px">
            <div class="col">
                Demo with Grounded Segment Anything (text prompt: "capybara"):
            </div>
        </div>
        <br>
        <center>
            <video style="width: 100%" controls>
                <source
                    src="https://user-images.githubusercontent.com/7107196/265596022-2ac5acc2-d160-49be-a013-68ad1d4074c5.mp4"
                    type="video/mp4">
                Your browser does not support the video tag.
            </video>
            Source: <a href="https://youtu.be/couz1CrlTdQ">https://youtu.be/couz1CrlTdQ</a>
        </center>

        <br>
        <hr>
        <br>

        <div class="row" style="font-size:32px">
            <div class="col">
                Demo with Segment Anything (automatic points-in-grid prompting); the DEVA result overlaid on the video plays first, followed by the original video:
            </div>
        </div>
        <br>
        <center>
            <video style="width: 100%" controls>
                <source
                    src="https://user-images.githubusercontent.com/7107196/265321745-ac6ab425-2f49-4438-bcd4-16e4ccfb0d98.mp4"
                    type="video/mp4">
                Your browser does not support the video tag.
            </video>
            Source: DAVIS 2017 validation set "soapbox"
        </center>

        <br>
        <hr>
        <br>

        <div class="row" style="font-size:32px">
            <div class="col">
                Demo with Segment Anything on an out-of-domain example; the DEVA result overlaid on the video plays first, followed by the original video:
            </div>
        </div>
        <br>
        <center>
            <video style="width: 100%" controls>
                <source
                    src="https://user-images.githubusercontent.com/7107196/265321764-48542bcd-113c-4454-b512-030df26def08.mp4"
                    type="video/mp4">
                Your browser does not support the video tag.
            </video>
            Source: <a href="https://youtu.be/FQQaSyH9hZI">https://youtu.be/FQQaSyH9hZI</a>
        </center>

        <br><br>

        <div style="font-size: 14px;">
            Contact: Ho Kei (Rex) Cheng (hkchengrex@gmail.com)
            <br>
        </div>

        <br><br>

    </div>

</body>

</html>
